Binary format for MPEG-7 instances



The present invention concerns an encoding method for encoding a description
element of an instance of an XML-like schema defining a hierarchical structure of
description elements, said hierarchical structure comprising hierarchical levels, parent
description elements and child description elements, said description element to be encoded
comprising a content.

It also concerns a decoding method for decoding a fragment comprising a 
content and a sequence of identification information. 

It also concerns an encoder intended for implementing said encoding method, a
decoder intended for implementing said decoding method, and a transmission system
comprising such an encoder and/or such a decoder.

It further concerns a table intended to be used in such an encoding or decoding
method and a signal transporting encoded description elements generated by using such an
encoding method.

The invention is applicable to XML-like instances of XML-like schemas. In
particular it is applicable to MPEG-7 documents.

XML is a recommendation of the W3C consortium (Extensible Markup
Language 1.0, dated October 6, 2000). XML-schema is also a recommendation of the W3C
consortium. An XML-schema defines a hierarchical structure of description elements (called
element or attribute in the W3C recommendation). An instance of an XML-schema
comprises description elements structured as defined in said XML-schema.

An object of the invention is to propose an encoding and a decoding method 
for transmitting and storing one or more description element(s) of an XML-like document 
which is an instance of an XML-like schema. 

According to the invention, an encoding method as described in the
introductory paragraph is characterized in that it consists in:

using at least one table derived from said schema, said table containing 
identification information for solely identifying each description element in a hierarchical 
level, and structural information for retrieving any child description element from its parent 
description element, 

scanning a hierarchical memory representation of said instance from parent
description elements to child description elements until reaching the description element to
be encoded, and retrieving the identification information of each scanned description
element,

encoding said description element to be encoded as a fragment comprising 
said content and a sequence of the retrieved identification information. 

When a description element is defined in the schema as possibly having 
multiple occurrences, said table further comprises for said description element an occurrence 
information for indicating that said description element may have multiple occurrences in an 
instance, and when an occurrence having a given rank is scanned during the encoding, the 
corresponding retrieved identification information is indexed with said rank. 

And a decoding method according to the invention as described in the 
introductory paragraph is characterized in that it consists in: 

using at least one table derived from an XML-like schema, said schema 
defining a hierarchical structure of description elements comprising hierarchical levels, 
parent description elements and child description elements, said table containing 
identification information for solely identifying each description element in a hierarchical 
level, and structural information for retrieving any child description element from its parent 
description element, 

scanning said sequence, identification information by identification
information,

at each step searching in said table for the description element associated to 
the current identification information and adding said description element to a hierarchical 
memory representation of an instance of said schema if not already contained in said 
hierarchical memory representation, 

adding said content to the description element of said hierarchical memory 
representation that is associated to the last identification information of said sequence. 

When a description element is defined in the schema as possibly having 
multiple occurrences, said table further comprises for said description element an occurrence 
information for indicating that said description element may have multiple occurrences in an 
instance, and when said sequence comprises an indexed identification information, said index 
is interpreted as an occurrence rank for the associated description element, same description 
element(s) of lower rank(s) being added to said hierarchical memory representation if not 
already contained in it. 

According to the invention each description element is represented by an
independent fragment in the stream, ensuring random access to elements and attributes as
well as a high level of flexibility as far as the incremental transfer is concerned. This
fragment approach also takes into account the fundamentally flexible and extensible nature of
MPEG-7 by using schemas to compute the sequence of identification information associated
to a given description element. The fragment approach allows the proposed binary format to
fulfil the following properties:

- Random access to instance elements and attributes.
- Incremental, non-ordered and scalable transfer.
- Compactness: only elements and attributes that have a primitive type content
  are coded.
- Easy integration with instance update protocol.
- Easy parsing and partial instantiation of binary MPEG-7 descriptions.

The other advantages of the invention are captured by the use of an
intermediate representation of the schema. Indeed, the table, which is directly and
unambiguously generated from the schema, makes it possible to share a common knowledge about the
possible valid instances between a server and a client, in a form dedicated to the binary
encoding and decoding of these instances. This common knowledge, gathering information
such as structure, type, and tag name of the elements and attributes, does not need to be sent
to the client, which leads to an efficient schema-aware encoding of the instances. This also
allows the binary format to achieve full extensibility support for future schemas defined inside
or outside MPEG-7.

Further features and advantages of the invention will become more readily
apparent from the following detailed description, which specifies and shows a preferred
embodiment of the invention in which:

Fig. 1 is a schematic representation of a transmission system according to the invention,
Fig. 2 is a diagram describing the steps of a coding method according to the invention,
Fig. 3 is a diagram describing the steps of a decoding method according to the invention,
Fig. 4 is a fragment embodied in a signal according to the invention,
Fig. 5 is an example of binary encoding of an instance compact key,
Fig. 6 is an example of binary encoding of the value of a description element.



. PHFR000110 



05.10.2001 



The invention will now be described by reference to XML instances of
XML-schemas. This is not restrictive. The invention is applicable to any instances and schemas
written in a markup language of the same type.

An XML-like schema defines a hierarchical structure of description elements 
(either an element or an attribute in the XML terminology) comprising parent description 
elements and child description elements. An instance of an XML-like schema is an XML-like 
document comprising description elements structured as defined in said XML-like schema. 
Some of the description elements of an instance have a content. Others are only structural
containers.

As described in Fig. 1, a transmission system according to the invention
comprises an encoder BiM-C located at the transmission side and a decoder BiM-D located at
the reception side. Both the encoder BiM-C and the decoder BiM-D have access to an
XML-schema XML-S (the XML-schema is either available locally or downloaded).

They also have access to at least one table EDT, called Element Declaration
Table, directly and unambiguously generated from the XML-schema. The Element
Declaration Table is primarily intended to contain all the information needed to encode and
decode any instance that is valid with respect to a given schema definition. The Element
Declaration Table is generated once and is available for coding and decoding any instance that
refers to the associated schema. It does not have to be sent to the client.

The encoder scans a hierarchical memory representation DM-C of an instance
XML-C (a DOM representation as defined in the W3C specification "Document Object
Model, level 1 specification, version 1.0, October 1, 1998", or any other hierarchical memory
representation of the instance) and uses the information contained in the Element Declaration
Table in order to generate one or more binary fragments BiM-F, each binary fragment being
associated to a description element of the instance.

According to the invention, the description elements that have a primitive type
content (e.g. built-in type, simple type, a descriptor with its own binary representation) are
encoded as an independent fragment composed of a sequence of identification information
(also called instance structuring key) and a content value. The description elements within the
XML hierarchy that are only structural containers (i.e. having no content value) are not
transmitted but inferred at the decoder side from the Element Declaration Table.

The binary fragments BiM-F are transmitted over a transmission network NET
and received by the decoder BiM-D. The decoder uses the Element Declaration Table in
order to retrieve:

- all the parent structural description elements,
- the description element nature (element or attribute),
- the description element name,
- the description element type in order to decode the content value.

The decoder BiM-D updates accordingly a hierarchical memory representation
DM-D. An XML instance XML-D is then generated from the updated hierarchical memory
representation.

One can see the Element Declaration Table as an exhaustive definition of the
possible valid instances, generated uniquely and unambiguously from the schema by
developing the element and attribute declaration structures. Indeed, the XML-schema gives
mainly two kinds of information. On the one hand, the location of all the possible elements
and attributes within the XML instance hierarchy is specified by means of complex type
definitions (either named or anonymous) and element declarations. On the other hand, the
type of their value is given through the use of built-in datatypes and simple type definitions.
For each element or attribute that is specified in the schema and that can be found in the
instance, the Element Declaration Table gathers its name (e.g. the tag name for an element),
its type, its nature (element or attribute) and a key (called table structuring key) specifying
unambiguously its location within the hierarchical XML structure. While the schema
defines what an instance should look like for validation and interoperability purposes, the
Element Declaration Table states what an instance will look like from a structural
perspective for coding purposes.

The basis of the Element Declaration Table and of its use in the encoding and
decoding process lies in the table structuring key, intended to uniquely identify:

- the type and name of the description element being transmitted,
- its location in the XML instance hierarchy.

The syntax of this structuring key is a dotted notation where the dots denote
hierarchy levels and the numbering at each level is performed by expanding all the element
and attribute declarations from the schema. The last digit of the notation is an identification
information solely identifying a description element in its hierarchical level. The previous
digits are pointing information used for retrieving a child description element from its parent
description element.

When a description element is defined in the schema as having or possibly
having multiple occurrences, an occurrence information is added at the end of the dotted
notation (in the remainder of the description the occurrence information is represented by
brackets).

The process of generating the Element Declaration Table is comparable to
browsing through all the element declarations in the schema in order to come up with a
hierarchical memory representation of the biggest instance (the one instantiating all possible
elements and attributes) corresponding to a given schema. Nevertheless, this "biggest"
instance is infinite as soon as the schema defines self-embedding structures, which are commonly
used within MPEG-7. Hence, there is a clear need for capturing the self-containment in the
Element Declaration Table. This is done by specifying, in case of a self-contained description
element, the table structuring key of its ancestor in the tree structure that has the same
complex type. Such an element is thus not expanded further in the Element Declaration
Table. The table structuring key of the ancestor is called self-containment key. It is also used
for retrieving a child description element from its parent description element.

The pointing information together with the self-containment key are the
structural information used to retrieve any child description element from its parent
description element. When a parent description element is a self-contained description
element, its children are the description elements whose pointing information is identical to
the self-containment key of said parent description element. When a parent is not a
self-contained description element, its child description elements are the description elements
whose pointing information is identical to said parent table structuring key.

The Element Declaration Table makes it possible to establish a unique and unambiguous
numbering of all possible instances of the schema. Examples of schemas and of the
corresponding Element Declaration Tables will now be given.

EXAMPLE 1
Schema 1:

    <complexType name="complexType1">
      <sequence>
        <element name="Element1" type="type1"/>
        <element name="Element2" type="type2" minOccurs="0" maxOccurs="unbounded"/>
        <element name="Element3">
          <complexType>
            <sequence>
              <element name="Element4" type="type4" minOccurs="0" maxOccurs="1"/>
              <element name="Element1" type="type1"/>
            </sequence>
          </complexType>
        </element>
      </sequence>
      <attribute name="Attribute1" type="type4"/>
    </complexType>
    <element name="GlobalElement" type="complexType1"/>

The Element Declaration Table, seen as a development of all schema element
declarations, would contain among other information the following element names together
with their corresponding table structuring key:

Table 1:

    Name             Table structuring key    (...)
    GlobalElement    0
    Element1         0.0
    Element2         0.1[]
    Element3         0.2
    Element4         0.2.0
    Element1         0.2.1
    Attribute1       0.3

The underlined digits (the last digit of each key) are the identification information.

EXAMPLE 2
Schema 2:

    <complexType name="complexType1">
      <sequence>
        <element name="Element1" type="complexType2"/>
        <element name="Element2" type="type2" minOccurs="0" maxOccurs="unbounded"/>
        <element name="Element3" type="type3"/>
      </sequence>
      <attribute name="Attribute1" type="type4"/>
    </complexType>
    <complexType name="complexType2">
      <sequence>
        <element name="Element4" type="type4"/>
        <element name="Element1" type="complexType2"/>
      </sequence>
    </complexType>
    <element name="GlobalElement" type="complexType1"/>

The Element Declaration Table contains, among other information such as the
name and key of the elements, the self-containment field when relevant:

Table 2:

    Name             Table structuring key    Self-containment key    (...)
    GlobalElement    0
    Element1         0.0
    Element4         0.0.0
    Element1         0.0.1                    0.0
    Element2         0.1[]
    Element3         0.2
    Attribute1       0.3

The underlined digits of the table structuring key (the last digit of each key) are the
identification information. The non-underlined digits of the table structuring key and the
self-containment key are the structural information used to retrieve any child description
element from its parent description element.

Note that the brackets in the Element2 table structuring key denote the
presence of a multiple occurrence element. Moreover, Element2 and Element4 are taken into
account in the numbering even though they are optional elements. Note also that
Element1 appears twice in the table since it can be instantiated at different locations within
the tree structure.
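
As an illustration only, the expansion described above can be sketched in Python. The dictionary SCHEMA_2 is a hypothetical, simplified model of Schema 2 (each complex type lists its children in declaration order, attributes last), and the function build_edt and its row layout are assumptions made for this sketch, not part of the described method. The sketch numbers the children of each type, appends brackets for multiple occurrence elements, and stops the expansion when an element has the same complex type as one of its ancestors, recording the key of that ancestor as self-containment key.

```python
# Hypothetical simplified model of Schema 2:
# complex type name -> list of (child name, child type, multiple occurrences?)
SCHEMA_2 = {
    "complexType1": [
        ("Element1", "complexType2", False),
        ("Element2", "type2", True),
        ("Element3", "type3", False),
        ("Attribute1", "type4", False),
    ],
    "complexType2": [
        ("Element4", "type4", False),
        ("Element1", "complexType2", False),
    ],
}


def build_edt(root_name, root_type, schema):
    """Return rows (name, table structuring key, self-containment key or None)."""
    rows = []

    def expand(name, type_name, key, multiple, ancestors):
        # ancestors maps a complex type name to the key of the enclosing
        # element that has this type.
        shown = key + "[]" if multiple else key
        if type_name in ancestors:
            # Self-contained element: record the ancestor key, do not expand.
            rows.append((name, shown, ancestors[type_name]))
            return
        rows.append((name, shown, None))
        for i, (child, child_type, child_multiple) in enumerate(schema.get(type_name, [])):
            expand(child, child_type, f"{key}.{i}", child_multiple,
                   {**ancestors, type_name: key})

    expand(root_name, root_type, "0", False, {})
    return rows


EDT_2 = build_edt("GlobalElement", "complexType1", SCHEMA_2)
for row in EDT_2:
    print(row)
```

Under these assumptions the rows come out in the same order and with the same keys as Table 2.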

EXAMPLE 3
Schema:

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
      <complexType name="MediaTimeType" content="elementOnly">
        <sequence>
          <element name="Start">
            <simpleType base="integer"/>
          </element>
          <element name="Stop">
            <simpleType base="integer"/>
          </element>
        </sequence>
        <attribute name="timeunit" type="string" use="required"/>
      </complexType>
      <complexType name="VideoSegmentType" content="elementOnly">
        <sequence>
          <element name="keyFrame" minOccurs="1" maxOccurs="unbounded">
            <simpleType base="string"/>
          </element>
          <element name="Annotation" type="string" minOccurs="0" maxOccurs="1"/>
          <element name="MediaTime" type="MediaTimeType" minOccurs="0" maxOccurs="1"/>
          <element name="VideoSegment" type="VideoSegmentType" minOccurs="0" maxOccurs="unbounded"/>
        </sequence>
        <attribute name="id" use="required">
          <simpleType base="string"/>
        </attribute>
      </complexType>
      <element name="VideoSegment" type="VideoSegmentType"/>
    </schema>

Table 3:

    Name            Table structuring key    Self-containment key
    VideoSegment    0
    KeyFrame        0.0[]
    Annotation      0.1
    MediaTime       0.2
    Start           0.2.0
    Stop            0.2.1
    Timeunit        0.2.2
    VideoSegment    0.3[]                    0
    Id              0.4

The underlined digits of the table structuring key (the last digit of each key) are the
identification information. The non-underlined digits of the table structuring key and the
self-containment key are the structural information used to retrieve any child description
element from its parent description element.
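
The retrieval of child description elements from the structural information can be sketched as follows. This is a minimal Python illustration, assuming Table 3 is held as a list of rows; the names EDT_3, bare and children are hypothetical and chosen for this sketch only.

```python
# Table 3 as rows (name, table structuring key, self-containment key or None).
EDT_3 = [
    ("VideoSegment", "0", None),
    ("KeyFrame", "0.0[]", None),
    ("Annotation", "0.1", None),
    ("MediaTime", "0.2", None),
    ("Start", "0.2.0", None),
    ("Stop", "0.2.1", None),
    ("Timeunit", "0.2.2", None),
    ("VideoSegment", "0.3[]", "0"),
    ("Id", "0.4", None),
]


def bare(key):
    """Drop the occurrence information '[]' from a table structuring key."""
    return key.replace("[]", "")


def children(parent_key, edt):
    """Rows whose pointing information designates the element at parent_key."""
    parent = next(row for row in edt if bare(row[1]) == parent_key)
    # A self-contained parent takes the children of its same-typed ancestor.
    anchor = parent[2] if parent[2] is not None else parent_key
    result = []
    for row in edt:
        # Pointing information = everything before the last dot of the key.
        pointing = bare(row[1]).rpartition(".")[0]
        if pointing == anchor:
            result.append(row)
    return result


print([row[0] for row in children("0.2", EDT_3)])
print([row[0] for row in children("0.3", EDT_3)])
```

In this sketch the self-contained VideoSegment (key 0.3) yields the same children as the root VideoSegment (key 0), which is what allows nested VideoSegment elements to be decoded without expanding the table any further.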

A method for encoding a description element of an instance of a schema will
now be described by reference to fig. 2. According to fig. 2, in order to encode a description
element DE of an instance XML-C of a schema XML-S, the hierarchical memory
representation DM-C of the instance XML-C is scanned from parent to child description
element until reaching the description element DE to be encoded (step 2-1). At each
hierarchical level, the identification information IDi associated to the scanned description
element Di is retrieved from the table EDT that is associated to the schema XML-S (step 2-2).
The instance structuring key K(DE) of the description element DE is built as a sequence
of the retrieved identification information IDi (step 2-3). The fragment BiM-F(DE) is finally
built by adding the content C(DE) of the description element DE to the sequence of retrieved
identification information (step 2-4). The fragment is converted into binary format for
transmission.

An example of such an encoding process will now be given in reference to the
above described EXAMPLE 3.

ARRAY 1 below gives an example of an instance of the schema described in
EXAMPLE 3. On the left is given the instance structuring key of the element defined in the
corresponding line of the array. On the right is given the instance structuring key of the
attribute defined in the corresponding line of the array. Those instance structuring keys have
been obtained by using the above described encoding method.

The encoding of the description element <MediaTime timeunit="PT1N30F">
appearing in bold characters in Array 1 (the MediaTime element of the VideoSegment whose id
is VS4) will now be described step by step for illustrative purposes.

step 1-1: a hierarchical memory representation of the instance is scanned from
parent description element to child description element until reaching the description element
to be encoded (here the attribute "timeunit" of an element "MediaTime"); the scanned
description elements are:

    VideoSegment
    VideoSegment (first occurring child of a VideoSegment)
    VideoSegment (second occurring child of a VideoSegment)
    MediaTime
    timeunit

step 1-2: the corresponding identification information (including the index if
applicable) are retrieved from Table 3:

    VideoSegment                                               0
    VideoSegment (first occurring child of a VideoSegment)     3[0]
    VideoSegment (second occurring child of a VideoSegment)    3[1]
    MediaTime                                                  2
    timeunit                                                   2

step 2: a sequence of the retrieved identification information is built:
0.3[0].3[1].2.2. This sequence is the instance structuring key associated to the encoded
description element.
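
The steps above can be sketched in Python as follows. This is an illustration under assumptions: the dictionary EDT_3 is one possible in-memory form of Table 3, the scanned path is supplied as a list of (name, rank) pairs rather than read from a DOM tree, and the helper names child_key and encode_key are hypothetical.

```python
# Table 3: table structuring key -> (name, multiple occurrences?, self-containment key)
EDT_3 = {
    "0": ("VideoSegment", False, None),
    "0.0": ("KeyFrame", True, None),
    "0.1": ("Annotation", False, None),
    "0.2": ("MediaTime", False, None),
    "0.2.0": ("Start", False, None),
    "0.2.1": ("Stop", False, None),
    "0.2.2": ("Timeunit", False, None),
    "0.3": ("VideoSegment", True, "0"),
    "0.4": ("Id", False, None),
}


def child_key(parent_key, name):
    """Table structuring key of the child called name under parent_key."""
    # A self-contained parent is looked up through its self-containment key.
    anchor = EDT_3[parent_key][2] or parent_key
    for key, (child_name, _, _) in EDT_3.items():
        if key.rpartition(".")[0] == anchor and child_name == name:
            return key
    raise KeyError(name)


def encode_key(path):
    """Build the instance structuring key for a root-to-element path.

    path is a list of (name, rank); rank is ignored for single occurrence elements.
    """
    table_key = "0"
    if EDT_3[table_key][0] != path[0][0]:
        raise ValueError("path does not start at the root element")
    tokens = ["0"]
    for name, rank in path[1:]:
        table_key = child_key(table_key, name)
        ident = table_key.rpartition(".")[2]  # last digit = identification information
        tokens.append(f"{ident}[{rank}]" if EDT_3[table_key][1] else ident)
    return ".".join(tokens)


path = [("VideoSegment", None), ("VideoSegment", 0), ("VideoSegment", 1),
        ("MediaTime", None), ("Timeunit", None)]
fragment = (encode_key(path), "PT1N30F")  # (instance structuring key, content)
print(fragment)
```

The resulting pair corresponds to the fragment before its conversion into binary format.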

The other instance structuring keys given in Array 1 can be derived in the
same way.

The instance structuring key can also be seen as an instantiation of the table
structuring key. Indeed, the multiple occurrence elements are actually indexed (resulting in
instance structuring keys such as 0.3[0], 0.3[1], ...) and the self-containment loops are
developed (resulting in instance structuring keys such as 0.3[0].3[1].2.2 that do not appear in
the table but can be computed from it). The instance structuring key is encoded as a
description element identifier in an instance binary fragment.

A method for decoding a fragment will now be described by reference to
figure 3. According to figure 3, a decoding method according to the invention consists in:

- step 3-1: finding in the table the description element associated to the received
  sequence of identification information,
- step 3-2: decoding the received content according to the primitive type of said
  description element (found in the table),
- step 3-3: updating the hierarchical memory representation by adding said
  element together with its content; adding its parent description elements if they are missing;
  and in case of multiple occurrences, adding same description elements of lower rank if they
  are missing.

    Element instance    Instance                                                            Attribute instance
    structuring key                                                                         structuring key

                        <?xml version="1.0" encoding="UTF-8"?>
    0                   <VideoSegment id="VS1">                                             0.4
    0.0[0]               <keyFrame>./../video/Scotland.jpg</keyFrame>
    0.1                  <Annotation>My trip in Scotland</Annotation>
    0.2                  <MediaTime timeunit="PT1N30F">                                     0.2.2
    0.2.0                 <Start>0</Start>
    0.2.1                 <Stop>1500</Stop>
                         </MediaTime>
    0.3[0]               <VideoSegment id="VS2">                                            0.3[0].4
    0.3[0].0[0]           <keyFrame>./../video/video_landscape/landscape1.jpg</keyFrame>
    0.3[0].0[1]           <keyFrame>./../video/video_landscape/landscape2.jpg</keyFrame>
    0.3[0].0[2]           <keyFrame>./../video/video_lanscape/landscape2.jpg</keyFrame>
    0.3[0].3[0]           <VideoSegment id="VS3">                                           0.3[0].3[0].4
    0.3[0].3[0].0[0]       <keyFrame>./../video/video_landscape/forest.jpg</keyFrame>
    0.3[0].3[0].1          <Annotation>forest of oaks</Annotation>
    0.3[0].3[0].2          <MediaTime timeunit="PT1N30F">                                   0.3[0].3[0].2.2
    0.3[0].3[0].2.0         <Start>0</Start>
    0.3[0].3[0].2.1         <Stop>200</Stop>
                           </MediaTime>
                          </VideoSegment>
    0.3[0].3[1]           <VideoSegment id="VS4">                                           0.3[0].3[1].4
    0.3[0].3[1].0[0]       <keyFrame>./../video/video_landscape/beach.jpg</keyFrame>
    0.3[0].3[1].1          <Annotation>The north beach</Annotation>
    0.3[0].3[1].2          <MediaTime timeunit="PT1N30F">                                   0.3[0].3[1].2.2
    0.3[0].3[1].2.0         <Start>200</Start>
    0.3[0].3[1].2.1         <Stop>450</Stop>
                           </MediaTime>
                          </VideoSegment>
                         </VideoSegment>
                        </VideoSegment>

ARRAY 1

In practice the received sequence is scanned, identification information by
identification information, and the following algorithm is applied to update the hierarchical
memory representation of the instance:

Algorithm (1):
step 4-1:
    current token = first identification information of the sequence
    current node = root of the hierarchical memory representation
step 4-2:
    previous description element = description element corresponding to current node
    current description element = child of previous description element having
    current token as identification information
step 4-3: does current node have a child node corresponding to the current
description element?
step 4-4: if current node has a child node corresponding to the current
description element, go to step 4-8,
step 4-5: if current node doesn't have a child node corresponding to the current
description element, create such a child node,
step 4-6: in case of multiple occurrences create brother node(s) of lower rank,
if not already existing,
step 4-7: if current token = last identification information of the received
sequence, add the content to the node created at step 4-5 and go to step 4-8,
step 4-8:
    current token = next identification information
    current node = child node
    go to step 4-2.
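
A minimal Python sketch of this update loop is given below for illustration. It makes several simplifying assumptions that are not in the description: nodes are indexed directly by identification information instead of being matched against the Element Declaration Table, the class Node and the function add_fragment are hypothetical names, and the content is attached to the last node whether or not that node already existed.

```python
import re

# A token is an identification information with an optional index, e.g. "3" or "3[1]".
TOKEN = re.compile(r"(\d+)(?:\[(\d+)\])?$")


class Node:
    def __init__(self):
        self.children = {}   # identification information -> list of occurrences
        self.content = None


def add_fragment(root, key, content):
    """Update the tree rooted at root with one (instance structuring key, content)."""
    tokens = key.split(".")
    node = root  # the first token designates the root itself
    for token in tokens[1:]:
        ident, rank = TOKEN.match(token).groups()
        rank = int(rank) if rank is not None else 0
        occurrences = node.children.setdefault(ident, [])
        # Create the requested occurrence and any missing lower-rank siblings.
        while len(occurrences) <= rank:
            occurrences.append(Node())
        node = occurrences[rank]
    node.content = content
    return node


root = Node()
add_fragment(root, "0.3[0].3[1].2.2", "PT1N30F")
add_fragment(root, "0.3[0].4", "VS2")
print(len(root.children["3"][0].children["3"]))  # occurrences of the nested VideoSegment
```

Because each fragment carries its full key, fragments can be applied in any order; structural containers and lower-rank siblings are created on demand and stay empty until their own fragments arrive.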

For example, at step 4-2, the current description element can be retrieved by
using the following algorithm given in C-like code:

Algorithm (2):

Let instance_key be the sequence of tokens from the first identification information of the
received sequence to the current identification information of the received sequence.
Let edt_key be the corresponding table structuring key as found in the table.
Let prefix(key) be the largest prefix (n first tokens) of key that actually exists in the table.
Let suffix(key) be the last tokens of key so that key = prefix(key) + suffix(key).
Let self_cont(key) be the self-containment key.

    while (prefix(instance_key) != instance_key)
    {
        instance_key = self_cont(prefix(instance_key)) + suffix(instance_key);
    }
    edt_key = instance_key;

Applying step by step algorithm (2) to the sequence 0.0.1.1.0 in the above
described EXAMPLE 2 gives:

    instance_key = 0.0.1.1.0
    prefix(instance_key) = 0.0.1
    instance_key = self_cont(prefix(instance_key)) + suffix(instance_key) = 0.0 + 1.0 = 0.0.1.0
    prefix(instance_key) = 0.0.1
    instance_key = self_cont(prefix(instance_key)) + suffix(instance_key) = 0.0 + 0 = 0.0.0

Which leads finally to:

    edt_key = 0.0.0

which means that the current description element is Element4.
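
For illustration, algorithm (2) can be transcribed into runnable Python as below. The dictionary EDT_2 is an assumed in-memory form of Table 2, the helper names split_key and table_key are hypothetical, and the instance structuring key is assumed to have had its indexes removed beforehand.

```python
# Table 2: table structuring key -> (name, self-containment key or None)
EDT_2 = {
    "0": ("GlobalElement", None),
    "0.0": ("Element1", None),
    "0.0.0": ("Element4", None),
    "0.0.1": ("Element1", "0.0"),
    "0.1": ("Element2", None),
    "0.2": ("Element3", None),
    "0.3": ("Attribute1", None),
}


def split_key(key, edt):
    """Return (prefix, suffix tokens): longest prefix of key found in the table, and the rest."""
    tokens = key.split(".")
    for n in range(len(tokens), 0, -1):
        prefix = ".".join(tokens[:n])
        if prefix in edt:
            return prefix, tokens[n:]
    raise KeyError(key)


def table_key(instance_key, edt):
    """Map an index-free instance structuring key to its table structuring key."""
    prefix, suffix = split_key(instance_key, edt)
    while suffix:
        self_cont = edt[prefix][1]
        if self_cont is None:
            raise KeyError(instance_key)  # key cannot be resolved with this table
        # Fold one self-containment loop back onto the ancestor.
        instance_key = ".".join([self_cont] + suffix)
        prefix, suffix = split_key(instance_key, edt)
    return prefix


edt_key = table_key("0.0.1.1.0", EDT_2)
print(edt_key, EDT_2[edt_key][0])
```

The loop terminates because each iteration replaces a prefix by the strictly shorter key of one of its ancestors.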

In case of non-self-contained hierarchies, the mapping between the table
structuring key and the instance structuring key is straightforward. Indeed, one has simply to
remove the indexes found in the instance structuring key to retrieve the corresponding table
structuring key. In the above described EXAMPLE 1, a description element represented by
the instance structuring key 0.1[5] is the occurrence of Element2 with index 5 in a GlobalElement
(the sixth occurrence when indexes start at 0, as in Array 1).
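
A short Python sketch of this index removal, using a regular expression; the function name strip_indexes is a hypothetical helper introduced for illustration.

```python
import re


def strip_indexes(instance_key):
    """Remove every occurrence index such as [5] from an instance structuring key."""
    return re.sub(r"\[\d+\]", "", instance_key)


print(strip_indexes("0.1[5]"))
```

For EXAMPLE 1 this maps 0.1[5] to 0.1, the key listed as 0.1[] for Element2 in Table 1.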

In an advantageous embodiment of the invention the table structuring key and
the instance structuring key are compacted as will now be described. Experiments have
shown that such a compression of the structuring key leads to a significant gain regarding the
size of the key while offering exactly the same functionality.

The resulting keys are referred to as compact structuring keys (in short CSK). In the simplest
case (no self-containment), the CSK is the record number of the structuring key in the Element
Declaration Table.

First, we need to add a key to the current list of EDT fields by numbering the
Element Declaration Table records. Applied to the above described EXAMPLE 2, this leads
to:

Table 2bis:

    Name             Compact key    Table structuring key    Self-containment key
    GlobalElement    0              0
    Element1         1              0.0
    Element4         2              0.0.0
    Element1         3              0.0.1                    0.0
    Element2         4              0.1[]
    Element3         5              0.2
    Attribute1       6              0.3


Algorithm (3) is used to compute the CSK in the general case (with
self-contained structures) from the instance structuring key:

Algorithm (3):

Let instance_key be the instance structuring key of a given description element.
Let cs_key be the corresponding compact structuring key.
Let prefix(key) be the largest prefix (n first tokens) of key that actually exists in the EDT.
Let suffix(key) be the last tokens of key so that key = prefix(key) + suffix(key).
Let self_cont(key) be the self-containment key.
Let compact_form(key) be the corresponding compact form of key in the EDT.

    while (prefix(instance_key) != instance_key)
    {
        cs_key = cs_key + compact_form(prefix(instance_key));
        instance_key = self_cont(prefix(instance_key)) + suffix(instance_key);
    }
    cs_key = cs_key + compact_form(prefix(instance_key));

Example: We want to compute the CSK corresponding to the following structuring key:
0.0.1.1.0

Applying step by step the algorithm described above gives:

    instance_key = 0.0.1.1.0
    prefix(instance_key) = 0.0.1
    cs_key = 3
    instance_key = self_cont(prefix(instance_key)) + suffix(instance_key) = 0.0 + 1.0 = 0.0.1.0
    prefix(instance_key) = 0.0.1
    cs_key = 3.3
    instance_key = self_cont(prefix(instance_key)) + suffix(instance_key) = 0.0 + 0 = 0.0.0

Which leads finally to:

    cs_key = 3.3.2

In the above example, the element is not a multiple occurrence element, for the sake of
simplicity. It is nevertheless to be noted that each token of the instance structuring key (resp.
the instance CSK) might be indexed (resp. contain several indexes).
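
Algorithm (3) can be sketched in Python as follows, for illustration only. The dictionary EDT_2BIS is an assumed in-memory form of Table 2bis, the helper names longest_prefix_len and compact_key are hypothetical, and indexes are assumed to have been removed from the instance structuring key.

```python
# Table 2bis: table structuring key -> (compact key, self-containment key or None)
EDT_2BIS = {
    "0": (0, None),
    "0.0": (1, None),
    "0.0.0": (2, None),
    "0.0.1": (3, "0.0"),
    "0.1": (4, None),
    "0.2": (5, None),
    "0.3": (6, None),
}


def longest_prefix_len(tokens, edt):
    """Number of leading tokens forming the longest key present in the table."""
    for n in range(len(tokens), 0, -1):
        if ".".join(tokens[:n]) in edt:
            return n
    raise KeyError(".".join(tokens))


def compact_key(instance_key, edt):
    """Compute the compact structuring key of an index-free instance structuring key."""
    tokens = instance_key.split(".")
    compact = []
    while True:
        n = longest_prefix_len(tokens, edt)
        prefix = ".".join(tokens[:n])
        compact.append(str(edt[prefix][0]))
        if n == len(tokens):
            break  # the whole key exists in the table
        self_cont = edt[prefix][1]
        if self_cont is None:
            raise KeyError(instance_key)  # key cannot be resolved with this table
        # Replace the prefix by its self-containment key and continue.
        tokens = self_cont.split(".") + tokens[n:]
    return ".".join(compact)


print(compact_key("0.0.1.1.0", EDT_2BIS))
```

When no self-containment is involved the loop runs once and the result is simply the record number, for instance 5 for the key 0.2.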

The only purpose of the compact structuring key is to reduce the size of the
stream. Therefore the instance compact structuring key is first decoded to its expanded
form (instance structuring key) by the decoder before the above described decoding phase.
Algorithm (4) given below returns the instance structuring key corresponding to an instance
compact key:

Algorithm (4):

Let resultNCKey be the expanded form of compact_key (result of the algorithm).
Let compact_key be the instance compact structuring key of a given description element.
Let current_key be a token of the instance compact structuring key compact_key.
Let compact_key[i] be the i-th token of compact_key.
Let size(compact_key) be the number of tokens of compact_key.
Let diffCode(key1, key2) be the sub-key obtained by removing the common prefix of key1
and key2.
Let NCKey(CKey) be the corresponding expanded form of the compact key CKey.
Let self_cont(key) be the self-containment key of key.

All indexes are first removed from compact_key and are put back at the end in the developed
form of compact_key:

    current_key = compact_key[0];
    resultNCKey = NCKey(current_key);
    for (i = 1; i < size(compact_key); i++)
    {
        previous_key = current_key;
        current_key = compact_key[i];
        resultNCKey += "." + diffCode(NCKey(current_key), self_cont(previous_key));
    }
Example: We want to generate the instance structuring key corresponding to the following CSK:
3.3.2

Applying step by step the algorithm described above gives : 
compact_key = 3.3.2
current_key = 3
resultNCKey = 0.0.1 (looking at the EDT)
(i=1)
previous_key = 3
current_key = 3
self_cont(previous_key) = 0.0
NCKey(current_key) = 0.0.1
diffCode(0.0.1, 0.0) = "1"
resultNCKey = resultNCKey + "." + "1"
resultNCKey = 0.0.1.1
(i=2)
previous_key = 3
current_key = 2
self_cont(previous_key) = 0.0
NCKey(current_key) = 0.0.0
diffCode(0.0.0, 0.0) = "0"
resultNCKey = resultNCKey + "." + "0"
resultNCKey = 0.0.1.1.0
end

3.3.2 is thus the compact form of the instance structuring key 0.0.1.1.0.
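Algorithm 4 can be sketched in Python as follows. The sketch assumes that NCKey and self_cont are available as lookup tables indexed by compact token; the two dictionaries below are hypothetical and contain only the values quoted in the example (in practice they would come from the Element Declaration Table). Indexes are left out, as in the example.

```python
# Hypothetical tables, filled only with the values quoted in the example.
NC_KEY = {"3": "0.0.1", "2": "0.0.0"}   # compact token -> expanded key
SELF_CONT = {"3": "0.0"}                # compact token -> self-containment key


def diff_code(key1: str, key2: str) -> str:
    """Sub-key of key1 left after removing its common prefix with key2."""
    a, b = key1.split("."), key2.split(".")
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return ".".join(a[n:])


def expand(compact_key: str) -> str:
    """Expand an index-free compact structuring key (Algorithm 4)."""
    tokens = compact_key.split(".")
    current = tokens[0]
    result = NC_KEY[current]
    for token in tokens[1:]:
        previous, current = current, token
        result += "." + diff_code(NC_KEY[current], SELF_CONT[previous])
    return result


print(expand("3.3.2"))
```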

An example of binary syntax will now be described. Fragments are part of a 
file having a header. The header of the file contains at least an identifier of the schema (either 
an MPEG-defined ID or a URL as proposed in M6142). 

Each fragment is composed of an instance compact structuring key K(DEi) (or an instance structuring key) and a description element value C(DEi) (also called content), as shown in figure 4. The generic form of the instance structuring key is as follows:

Key[ind][ind](...)[ind].Key[ind][ind](...)[ind].(...), where each group Key[ind][ind](...)[ind] is called a token. Tokens of an instance structuring key comprise at most one index. Tokens of instance compact structuring keys may comprise several indexes. All keys and indexes are integer values coded using a variable number of bytes. The whole structuring key is thus coded using a variable set of bytes, each of them being controlled by the 2 most significant bits with the following semantics:



Control bits
Bit7  Bit6   Semantics
 0     0     "New level": The next byte represents the beginning of a new token.
 0     1     "Continues": The next byte is to be interpreted as the following bits of the current key or index.
 1     0     "Indexed": The next byte is the beginning of the next index within the current token.
 1     1     "End": The current byte is the last byte of the structuring key.



Figure 4 also describes the generic format for encoding description element values. According to figure 4, before a data value D(DEi) is added to the binary file or stream, the size in bytes S(DEi) of the data block is coded. This informs the decoder of the size of the data to be decoded and guarantees easy random access to data and fast stream parsing. Since certain primitive data types can require a large number of bytes (e.g. free text annotation or movie scripts), we propose to code the data size using a variable number of bytes.

The length is thus coded by default using one byte, with the most significant 
bit being interpreted as follows : 



Bit7   Semantics
 0     "end": The length coding is finished.
 1     "continues": The length coding continues on the next byte.



Figure 5 gives an example of a binary encoding for the compact key « 0.1[70][1] ». Five bytes are needed to encode this compact key. Each byte starts with two control bits. The six least significant bits are used to encode the value. The control bits of the first byte are '00' (new level). Its value bits are '000000', which is the binary representation of the first identification information of the sequence ('0'). The control bits of the second byte are '10' (indexed). Its value bits are '000001', which is the binary representation of the second identification information of the sequence ('1'). The binary representation of the first index '70' is '1000110', which contains more than six bits. Therefore the encoding is done on two bytes: the third and the fourth bytes. The control bits of the third byte are '01' (continues). Its value bits are '000110' (least significant bits of the index to be encoded). The control bits of the fourth byte are '10' (indexed). Its value bits are '000001' (most significant bits of the index to be encoded). Finally the control bits of the fifth byte are '11' (end), and its value bits are '000001' (binary value of the index to be encoded).
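The byte layout of figure 5 can be reproduced with the short Python sketch below. The function names and the textual key syntax parsed by `parse_key` are assumptions made for illustration; the control-bit values and the "least significant six bits first" ordering are taken from the description above.

```python
import re

NEW_LEVEL, CONTINUES, INDEXED, END = 0b00, 0b01, 0b10, 0b11


def parse_key(text: str) -> list[list[int]]:
    """'0.1[70][1]' -> [[0], [1, 70, 1]] (each key followed by its indexes)."""
    return [[int(n) for n in re.findall(r"\d+", token)]
            for token in text.split(".")]


def encode_key(text: str) -> bytes:
    """Encode a structuring key with 2 control bits and 6 value bits per byte."""
    tokens = parse_key(text)
    out = bytearray()
    for t, token in enumerate(tokens):
        for v, value in enumerate(token):
            # Split the value into 6-bit groups, least significant first.
            groups = [value & 0x3F]
            value >>= 6
            while value:
                groups.append(value & 0x3F)
                value >>= 6
            last_value = v == len(token) - 1
            last_token = t == len(tokens) - 1
            for g, bits in enumerate(groups):
                if g < len(groups) - 1:
                    control = CONTINUES      # more bits of the same value
                elif not last_value:
                    control = INDEXED        # an index follows
                elif not last_token:
                    control = NEW_LEVEL      # a new token follows
                else:
                    control = END            # last byte of the key
                out.append(control << 6 | bits)
    return bytes(out)


print(encode_key("0.1[70][1]").hex(" "))
```

Note that the control bits of a byte describe what follows it, so the encoder has to know whether the current value is the last one of its token and of the whole key.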

Figure 6 gives an example of a binary encoding of the data size 575 (binary: 1000111111). The first byte is composed of the 7 least significant bits of the length value with the addition of a control bit specifying that another byte is required. The second byte contains the remaining bits with the "end" control bit.
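The length coding can be sketched in Python as follows. The function names are hypothetical; the byte order (7 least significant bits first) and the meaning of bit 7 follow the description of figure 6.

```python
def encode_length(size: int) -> bytes:
    """Code a data size on a variable number of bytes, 7 value bits per byte."""
    out = bytearray()
    while True:
        low7 = size & 0x7F
        size >>= 7
        if size:
            out.append(0x80 | low7)   # bit 7 = 1: "continues"
        else:
            out.append(low7)          # bit 7 = 0: "end"
            return bytes(out)


def decode_length(data: bytes) -> tuple[int, int]:
    """Return (size, number of bytes consumed by the length field)."""
    size = shift = 0
    for used, byte in enumerate(data, start=1):
        size |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return size, used
    raise ValueError("truncated length field")


print(encode_length(575).hex(" "))
```

For 575 the first byte carries 0111111 with the control bit set and the second byte carries the remaining bits 100 with the control bit cleared.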

As already mentioned, a major advantage of the proposed coding scheme is to 
encode only the attributes and elements that contain a primitive type value, and skip the 
elements that are only structural containers (e.g. with a complex type). This is allowed given 
that the structure can be inferred at the decoder side using the Element Declaration Table. 
Example: 

Consider the following instance fragment (found in the core experiment test set) : 
<GenericDS> 
<MediaInformation> 
<MediaProfile> 
<MediaInstance> 
<InstanceLocator> 

<MediaURL>imgs/img00587_add3.jpg</MediaURL> 
</InstanceLocator> 
</MediaInstance> 
</MediaProfile> 
</MediaInformation> 
</GenericDS> 

In this case, only the MediaURL would be encoded (as a string) using a 
structuring key that allows the decoder to reconstruct the whole structure from the Element 
Declaration Table. The other container elements would not be transmitted. 
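The selection of the elements to be transmitted can be illustrated with the Python sketch below. It is only an approximation under stated assumptions: an element is treated as primitive when it has no child element and carries text, whereas the actual decision would be taken from the schema types, and the slash-separated tag path stands in for the structuring key, which cannot be computed without the Element Declaration Table.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<GenericDS>
  <MediaInformation>
    <MediaProfile>
      <MediaInstance>
        <InstanceLocator>
          <MediaURL>imgs/img00587_add3.jpg</MediaURL>
        </InstanceLocator>
      </MediaInstance>
    </MediaProfile>
  </MediaInformation>
</GenericDS>"""


def primitive_leaves(xml_text: str) -> list[tuple[str, str]]:
    """List (path, value) for attributes and for text-only leaf elements."""
    found = []

    def walk(element, path):
        path = path + [element.tag]
        for name, value in element.attrib.items():
            found.append(("/".join(path) + "/@" + name, value))
        children = list(element)
        text = (element.text or "").strip()
        if not children and text:
            found.append(("/".join(path), text))
        for child in children:
            walk(child, path)   # containers themselves are skipped

    walk(ET.fromstring(xml_text), [])
    return found


for path, value in primitive_leaves(FRAGMENT):
    print(path, "=", value)
```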

In the general case, all the elements whose type is primitive (i.e. for which a binary representation is available in a standard way which ensures interoperability) shall be encoded.

Examples of such primitive types are the XML-schema built-in types (e.g. string, float, ...) as well as the MPEG-7 specific basic types (e.g. unsignedInt1, unsignedInt2, MediaTime, Matrix, ...).

Primitive types also include extended types that might include complex types in the following cases:

there is no need for accessing randomly the embedded elements within the 
complex type structure. 

an efficient binary representation already exists. 

These criteria are certainly fulfilled in the case of descriptors as defined by the video and audio group of MPEG-7. Indeed, a compact binary representation has already been defined and should be used. Furthermore, there is (most of the time) no need for accessing the individual parts of the descriptors (they make sense as a whole).

The efficiency (in terms of content compression) will increase with an increasing number of primitive types (which are encoded in an optimal way), but so does the complexity of the decoder, which is supposed to include the decoding methods for all the standard primitive types.

ARRAY 2 below is an example of compact instance structuring keys for the instance already used in ARRAY 1. The compact instance structuring key associated with the description element <MediaTime timeUnit="PT1N30F"> is 7[0].7[1].6. The binary representation of this compact instance structuring key is '10-000111 00-000000 10-000111 00-000001 11-000110'. The length of the content is encoded on 1 byte: 0-0000111. The value PT1N30F is converted from string characters to bytes using usual character coding.
Element Compact Key  Attribute Compact Key  Instance
-------------------  ---------------------  --------
                                            <?xml version="1.0" encoding="UTF-8"?>
0                    8                      <VideoSegment id="VS1">
1[0]                                          <keyFrame>./../video/Scotland.jpg</keyFrame>
2                                             <Annotation>My trip in Scotland</Annotation>
3                    6                        <MediaTime timeunit="PT1N30F">
4                                               <Start>0</Start>
5                                               <Stop>1500</Stop>
                                              </MediaTime>
7[0]                 7[0].8                   <VideoSegment id="VS2">
7[0].1[0]                                       <keyFrame>./../video/video_landscape/landscape1.jpg</keyFrame>
7[0].1[1]                                       <keyFrame>./../video/video_landscape/landscape2.jpg</keyFrame>
7[0].1[2]                                       <keyFrame>./../video/video_lanscape/landscape2.jpg</keyFrame>
7[0].7[0]            7[0].7[0].8                <VideoSegment id="VS3">
7[0].7[0].1[0]                                    <keyFrame>./../video/video_landscape/forest.jpg</keyFrame>
7[0].7[0].2                                       <Annotation>forest of oaks</Annotation>
7[0].7[0].3          7[0].7[0].6                  <MediaTime timeunit="PT1N30F">
7[0].7[0].4                                         <Start>0</Start>
7[0].7[0].5                                         <Stop>200</Stop>
                                                  </MediaTime>
                                                </VideoSegment>
7[0].7[1]            7[0].7[1].8                <VideoSegment id="VS4">
7[0].7[1].1[0]                                    <keyFrame>./../video/video_landscape/beach.jpg</keyFrame>
7[0].7[1].2                                       <Annotation>The north beach</Annotation>
7[0].7[1].3          7[0].7[1].6                  <MediaTime timeunit="PT1N30F">
7[0].7[1].4                                         <Start>200</Start>
7[0].7[1].5                                         <Stop>450</Stop>
                                                  </MediaTime>
                                                </VideoSegment>
                                              </VideoSegment>
                                            </VideoSegment>

ARRAY 2
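As an illustration, the fragment carrying the attribute value PT1N30F under the compact key 7[0].7[1].6 can be parsed back with the Python sketch below. The function name is hypothetical, the fragment is assumed to consist of the key bytes, the length field and the content in that order, and ASCII is assumed for the "usual character coding" of the value.

```python
def decode_fragment(data: bytes) -> tuple[str, bytes]:
    """Split one fragment into (compact key text, content bytes)."""
    pos = 0
    key_text = ""
    value = shift = 0
    first_in_token = True
    while True:
        byte = data[pos]
        pos += 1
        control, bits = byte >> 6, byte & 0x3F
        value |= bits << shift
        if control == 0b01:          # "continues": more bits of this value
            shift += 6
            continue
        key_text += str(value) if first_in_token else f"[{value}]"
        value = shift = 0
        if control == 0b10:          # "indexed": an index follows
            first_in_token = False
        elif control == 0b00:        # "new level": a new token follows
            key_text += "."
            first_in_token = True
        else:                        # 0b11, "end" of the structuring key
            break
    # Length field: 7 value bits per byte, bit 7 set means "continues".
    size = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return key_text, data[pos:pos + size]


# Key bytes and length byte as given in the text, followed by the value.
fragment = bytes([0x87, 0x00, 0x87, 0x01, 0xC6, 0x07]) + b"PT1N30F"
key, content = decode_fragment(fragment)
print(key, content.decode("ascii"))
```

Because the size precedes the content, a decoder that is not interested in this key can skip the 7 content bytes without interpreting them.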



